
    Distributed-Pair Programming can work well and is not just Distributed Pair-Programming

Background: Distributed Pair Programming can be performed via screen sharing or via a distributed IDE. The latter offers the freedom of concurrent editing (which may be helpful or damaging) and has even more awareness deficits than screen sharing. Objective: Characterize how competent distributed pair programmers may handle this additional freedom and these additional awareness deficits, and characterize the impacts on the pair programming process. Method: A revelatory case study, based on direct observation of a single, highly competent distributed pair of industrial software developers during a 3-day collaboration. We use recordings of these sessions and conceptualize the phenomena seen. Results: 1. Skilled pairs may bridge the awareness deficits without visible obstruction of the overall process. 2. Skilled pairs may use the additional editing freedom in a useful, limited fashion, resulting in potentially better fluency of the process than in local pair programming. Conclusion: When applied skillfully in an appropriate context, distributed-pair programming can (not will!) work at least as well as local pair programming.

    Plagiarism in Take-home Exams: Help-seeking, Collaboration, and Systematic Cheating

Due to the increased enrollments in Computer Science education programs, institutions have sought ways to automate and streamline parts of course assessment in order to be able to invest more time in guiding students' work. This article presents a study of plagiarism behavior in an introductory programming course, where a traditional pen-and-paper exam was replaced with multiple take-home exams. The students who took the take-home exam enabled a software plugin that recorded their programming process. During an analysis of the students' submissions, potential plagiarism cases were highlighted, and students were invited to interviews. The interviews with the candidates for plagiarism highlighted three types of plagiarism behaviors: help-seeking, collaboration, and systematic cheating. Analysis of programming process traces indicates that parts of such behavior are detectable directly from programming process data.
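
As an illustration of that final point, a detector over process traces might look for sudden large jumps in program size between consecutive snapshots, which can signal pasted-in code. The sketch below is not the authors' plugin; the snapshot format and the threshold are assumptions made purely for illustration.

```python
# A minimal sketch (not the study's recording plugin) of mining
# programming-process traces for paste-like events: if the code grows
# by a large amount between two consecutive snapshots, that jump is
# flagged as a candidate for manual review.

def flag_suspicious_jumps(snapshots, max_growth=200):
    """snapshots: list of (timestamp_seconds, source_code) pairs in
    chronological order. Returns (timestamp, chars_added) for every
    jump whose growth exceeds max_growth characters."""
    flagged = []
    for (t_prev, code_prev), (t_next, code_next) in zip(snapshots, snapshots[1:]):
        growth = len(code_next) - len(code_prev)
        if growth > max_growth:
            flagged.append((t_next, growth))
    return flagged

if __name__ == "__main__":
    trace = [
        (0,  ""),
        (60, "def mean(xs):\n    return sum(xs) / len(xs)\n"),
        # A sudden 500-character jump between two snapshots:
        (65, "def mean(xs):\n    return sum(xs) / len(xs)\n" + "x" * 500),
    ]
    for timestamp, growth in flag_suspicious_jumps(trace):
        print(f"t={timestamp}s: +{growth} characters in one step")
```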

    LittleDarwin: a Feature-Rich and Extensible Mutation Testing Framework for Large and Complex Java Systems

Mutation testing is a well-studied method for increasing the quality of a test suite. We designed LittleDarwin as a mutation testing framework able to cope with large and complex Java software systems, while still being easily extensible with new experimental components. LittleDarwin addresses two existing problems in the domain of mutation testing: providing a tool that can work within an industrial setting, yet remains open to extension with cutting-edge techniques from academia. LittleDarwin already offers higher-order mutation, null type mutants, mutant sampling, manual mutation, and mutant subsumption analysis. No tool available today offers all of these features while working with typical industrial software systems. (Comment: Pre-proceedings of the 7th IPM International Conference on Fundamentals of Software Engineering.)
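
To illustrate the kind of operation such a framework automates, the sketch below shows a toy arithmetic-operator mutation and a mutation-score computation. It is a generic illustration of the mutation testing idea, not LittleDarwin's actual implementation, and the regex-based mutator is deliberately simplistic.

```python
# A toy illustration of the core idea behind mutation testing: generate
# "mutants" by applying one small syntactic change each, then count how
# many mutants the test suite detects (kills).

import re

def generate_mutants(source: str):
    """Yield one mutant per '+' occurrence, each with that single
    '+' replaced by '-' (a classic arithmetic-operator mutation)."""
    for match in re.finditer(r"\+", source):
        i = match.start()
        yield source[:i] + "-" + source[i + 1:]

def mutation_score(source, run_tests):
    """run_tests(mutated_source) -> True if the suite fails on the
    mutant, i.e. the mutant is killed. Returns the kill ratio."""
    mutants = list(generate_mutants(source))
    killed = sum(run_tests(m) for m in mutants)
    return killed / len(mutants) if mutants else 1.0

if __name__ == "__main__":
    java_snippet = "int total = a + b + c;"
    for mutant in generate_mutants(java_snippet):
        print(mutant)
```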

Managing plagiarism in programming assignments with blended assessment and randomisation

Plagiarism is a common concern for coursework in many situations, particularly where solutions can be submitted electronically (e.g. computer programs), and it undermines the reliability of assessment. Written exams are often used to try to deal with this and to increase reliability, but at the expense of validity. One solution, outlined in this paper, is to randomise the work that is set for students so that it is very unlikely that any two students will be working on exactly the same problem set. This also helps to address the issue of students outsourcing their work by paying external people to complete their assignments for them. We examine the effectiveness of this approach and others (including blended assessment) by analysing the spread of similarity scores across four different introductory programming assignments to find the natural similarity, i.e. the level of similarity that could reasonably occur without plagiarism. The results of the study indicate that divergent assessment (having more than one possible solution), as opposed to convergent assessment (only one solution), is the dominant factor in natural similarity. A key area for further work is to apply the analysis to a larger sample of programming assignments to better understand the impact of different features of assignment design on natural similarity and hence on the detection of plagiarism.
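
The randomisation idea can be sketched briefly: derive each student's problem parameters deterministically from their ID, so that variants differ across students yet remain reproducible for marking. The parameter names and ranges below are invented for illustration and are not taken from the paper.

```python
# A minimal sketch of per-student assignment randomisation: hash the
# student ID and assignment name into a reproducible seed, then draw
# the problem parameters from a generator seeded with it. The same
# call always regenerates the same variant, so markers can rebuild it.

import hashlib
import random

def student_variant(student_id: str, assignment: str) -> dict:
    seed = int.from_bytes(
        hashlib.sha256(f"{student_id}:{assignment}".encode()).digest()[:8],
        "big",
    )
    rng = random.Random(seed)
    return {
        "array_length": rng.randint(50, 200),
        "sort_order": rng.choice(["ascending", "descending"]),
        "pivot_rule": rng.choice(["first", "median-of-three", "random"]),
    }

if __name__ == "__main__":
    print(student_variant("s1234567", "quicksort-assignment"))
```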

    Open Science in Software Engineering

Open science describes the movement of making any research artefact available to the public and includes, but is not limited to, open access, open data, and open source. While open science is becoming generally accepted as a norm in other scientific disciplines, in software engineering we are still struggling to adapt open science to the particularities of our discipline, which renders progress in our scientific community cumbersome. In this chapter, we reflect upon the essentials of open science for software engineering, including what open science is, why we should engage in it, and how we should do it. We draw in particular on our experiences as conference chairs implementing open science initiatives and as researchers actively engaging in open science to critically discuss challenges and pitfalls, and to address more advanced topics such as how and under which conditions to share preprints, what infrastructure and licence model to use, and how to do it within the limitations of different reviewing models, such as double-blind reviewing. Our hope is to help establish a common ground and to contribute to making open science a norm also in software engineering. (Comment: Camera-ready version of a chapter published in the book Contemporary Empirical Methods in Software Engineering; fixed layout issue with side-note.)

    An intuitive Python interface for Bioconductor libraries demonstrates the utility of language translators

Background: Computer languages can be domain-related, and in the case of multidisciplinary projects, knowledge of several languages will be needed in order to implement ideas quickly. Moreover, each computer language has relative strong points, making some languages better suited than others for a given task. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. Results: The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Conclusions: Bioconductor is no longer reserved solely for R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/
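
The underlying mechanism can be shown with plain rpy2, on which the paper's package builds. The sketch below assumes R, rpy2, and the Bioconductor package Biobase are installed; Biobase is used purely as an example package.

```python
# A minimal sketch of calling R/Bioconductor from Python via plain
# rpy2 (the paper's rpy2-bioconductor-extensions package layers more
# Pythonic classes on top of this mechanism).

from rpy2 import robjects
from rpy2.robjects.packages import importr

biobase = importr("Biobase")              # load an R/Bioconductor package
print(robjects.r("R.version.string")[0])  # evaluate R code from Python

# importr exposes the package's R functions as Python attributes,
# with dots in R identifiers translated to underscores.
```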

    A comparison of common programming languages used in bioinformatics

Background: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results: Implementations in C and C++ were fastest and used the least memory, but programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux, and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/ Conclusion: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the expected performance and the library availability for each language.
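
To give a flavour of what such a benchmark measures, the sketch below times a dynamic-programming edit-distance kernel, the kind of computation underlying the Sellers algorithm, in Python. It is a single-language illustration of the measurement approach, not the study's benchmark harness, and the input sequences are invented.

```python
# A minimal sketch of wall-clock benchmarking for a string-comparison
# kernel: Wagner-Fischer edit distance with a rolling row, timed with
# time.perf_counter.

import time

def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

if __name__ == "__main__":
    a, b = "ACGT" * 250, "AGGT" * 250   # two 1000-character sequences
    start = time.perf_counter()
    d = edit_distance(a, b)
    elapsed = time.perf_counter() - start
    print(f"distance={d}, time={elapsed:.3f}s")
```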

    AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment

Background: Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure-activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds, and the size of proprietary as well as public data sets is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms that are easily available to researchers without extensive machine learning knowledge. Because they uphold the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source, state-of-the-art, high-performance machine learning platform, interfacing multiple customized machine learning algorithms for both graphical programming and scripting, to be used for large-scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results: This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models by providing the full workflow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated workflow relies upon the customization of the machine learning algorithms and a generalized, automated selection of model hyper-parameters. Several high-performance machine learning algorithms are interfaced for efficient, data-set-specific selection of the statistical method, promoting model accuracy. Using the high-performance machine learning algorithms of AZOrange does not require programming knowledge, as flexible applications can be created not only at the scripting level but also in a graphical programming environment. Conclusions: AZOrange is a step towards meeting the need for an Open Source high-performance machine learning platform that supports the efficient development of highly accurate QSAR models fulfilling regulatory requirements.
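
AZOrange's own workflow is not reproduced here, but the pattern the abstract describes, an automated search over algorithm settings with cross-validated model selection, can be sketched with scikit-learn as an analogous stand-in. The descriptors and labels below are random placeholders rather than real QSAR data.

```python
# An analogous sketch (scikit-learn, not AZOrange) of automated
# hyper-parameter selection: search a small grid of settings and keep
# the model with the best cross-validated score.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))       # 200 "compounds" x 16 "descriptors"
y = rng.integers(0, 2, size=200)     # placeholder binary activity labels

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 200], "max_depth": [None, 8]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```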

Deep Learning Application in Security and Privacy - Theory and Practice: A Position Paper

Technology is shaping our lives in a multitude of ways. This is fuelled by a technology infrastructure, both legacy and state of the art, composed of a heterogeneous group of hardware, software, services and organisations. Such infrastructure faces a diverse range of challenges to its operations, including security, privacy, resilience, and quality of service. Among these, cybersecurity and privacy are taking centre stage, especially since the General Data Protection Regulation (GDPR) came into effect. Traditional security and privacy techniques are overstretched, and adversarial actors have evolved to design exploitation techniques that circumvent protection. With the ever-increasing complexity of technology infrastructure, security and privacy-preservation specialists have started to look for adaptable and flexible protection methods that can evolve (potentially autonomously) as the adversarial actor changes its techniques. For this, Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) have been put forward as saviours. In this paper, we look at the promises of AI, ML, and DL stated in academic and industrial literature and evaluate how realistic they are. We also put forward potential challenges that a DL-based security and privacy protection technique has to overcome. Finally, we conclude the paper with a discussion of the steps the DL and the security and privacy-preservation communities must take to ensure that DL is not just hype, but an opportunity to build a secure, reliable, and trusted technology infrastructure on which we can rely for so much in our lives.